ABSTRACT


Amemiya's estimator is a weighted least squares estimator of the regression coefficients in a linear
model with heteroscedastic errors. It is attractive because the heteroscedasticity is not parameterised
and  the  weights  (which  depend  on  the  error  covariance  matrix)  are  estimated  nonparametrically.  In 
this  paper,  we  obtain  an  asymptotic  expansion  for  Amemiya's  form  of  the  weighted  least  squares 
estimator.  We  use  this  expansion  to  discuss  the  effect  of  estimating  the  weights,  the  effect  of  the 
number  of  iterations  and  the  effect  of  the  choice  of  the  initial  estimate.  We  also  discuss  the  special 
case  of  normally  distributed  errors  and  clarify  the  special  consequences  of  assuming  normality. 


1.  Introduction 

Econometric modelling is frequently complicated by heterogeneous variability in the
stochastic component of the model. Such heteroscedasticity arises in almost all fields; for
examples see Carroll and Ruppert (1988). It is always possible, of course, to ignore the
heteroscedasticity and proceed with a standard analysis, but substantial gains in efficiency are
possible if we incorporate information about the heteroscedasticity into the analysis. One
approach  is  to  model  the  heteroscedasticity  by  introducing  an  explicit  parametric  model  for  the 
scale  of  the  stochastic  component  of  the  model.  This  approach  has  been  explored  in 
considerable  detail;  again  see  Carroll  and  Ruppert  (1988)  for  a  recent  survey.  It  can,  however, 
be  prohibitively  difficult  to  parametrize  heteroscedasticity.  In  practice  purely  empirical  models 
are  difficult  to  identify,  and  there  may  be  no  theoretical  motivation  for  a  particular  structural 
model.  Economic  theory  is  rich  in  models  for  conditional  means  but  meagre  as  a  source  of 
models  for  scale.  In  this  paper,  therefore,  we  will  consider  an  approach  suggested  by 
Amemiya  (1983)  which  attempts  to  deal  with  heteroscedasticity  without  introducing  an  explicit 
parametric model. This approach is closely allied with the work of Eicker (1963) and White
(1982) on consistent covariance matrix estimation and Chamberlain (1982) on method of
moments estimation.

Consider  the  heteroscedastic  linear  model 
$$y = X\beta + u, \qquad (1.1)$$

where $X$ is an $n \times p$ matrix of known constants with rows denoted $x_j^T$, $j = 1,\ldots,n$, $\beta$ is a
$p$-vector of unknown parameters and $u = (u_1,\ldots,u_n)^T$ is an $n$-vector of independent random
variables with $Eu_j = 0$, $Eu_j^2 = \sigma_j^2$, $Eu_j^3 = \mu_{3j}$ and $Eu_j^4 = \mu_{4j} < \infty$. The regression
parameter $\beta$ is the parameter of interest while $\Sigma = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$ is regarded as an
arbitrary $n$-dimensional nuisance parameter. In the classical linear model we take $u_1,\ldots,u_n$ to be
identically distributed so that $\Sigma = \sigma^2 I$.

The  weighted  least  squares  estimator  is  widely  used  for  estimating  the  regression 
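parameter in heteroscedastic linear models. Notice that when $\Sigma$ is known, premultiplying (1.1)
by $\Sigma^{-1/2}$ yields a classical linear model for which the least squares estimator is

$$\hat\beta_\Sigma = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y.$$

When $\Sigma$ is unknown $\hat\beta_\Sigma$ cannot be computed but we may be able to substitute an appropriate
$\hat\Sigma$ for $\Sigma$ to obtain

$$\hat\beta_{\hat\Sigma} = (X^T\hat\Sigma^{-1}X)^{-1}X^T\hat\Sigma^{-1}y.$$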
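The equivalence between premultiplying by $\Sigma^{-1/2}$ and the closed-form weighted estimator can be checked numerically. The following is a minimal sketch, assuming a known diagonal $\Sigma$; the sample size, design and variances are illustrative choices and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 2
X = rng.normal(size=(n, p))                 # illustrative design matrix
beta = np.array([1.0, -0.5])                # illustrative true coefficients
sigma2 = np.linspace(0.5, 4.0, n)           # assumed known diagonal of Sigma
y = X @ beta + rng.normal(size=n) * np.sqrt(sigma2)

# Route 1: premultiply the model by Sigma^{-1/2}, then ordinary least squares.
s = 1.0 / np.sqrt(sigma2)
beta_whitened, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)

# Route 2: the closed form (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
Xw = X / sigma2[:, None]                    # Sigma^{-1} X
beta_gls = np.linalg.solve(X.T @ Xw, Xw.T @ y)

print(np.allclose(beta_whitened, beta_gls))
```

Both routes solve the same normal equations, so the two coefficient vectors agree up to rounding error.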

Since $\Sigma$ is not parametrized, an appropriate $\hat\Sigma$ is obtained by setting
$\hat\Sigma = \mathrm{diag}(r_1^2,\ldots,r_n^2)$, where $r = y - X\hat\beta_{(0)}$ is the vector of residuals from an initial
estimator $\hat\beta_{(0)}$ of $\beta$. Notice that $\hat\Sigma$ is in fact an estimator of $\mathrm{diag}(u_1^2,\ldots,u_n^2)$ rather
than of $\Sigma$. Although we actually need to estimate $n^{-1}X^T\Sigma^{-1}X$ and $n^{-1}X^T\Sigma^{-1}y$ rather than
$\Sigma$, it turns out that we cannot estimate $n^{-1}X^T\Sigma^{-1}X$ unless we can estimate $\Sigma$. As we have
only $n$ observations with which to estimate the $n$ parameters in $\Sigma$ we cannot construct a
consistent estimator of $\Sigma$. However, there is a convenient reformulation of $\hat\beta_\Sigma$ which enables
us to overcome this difficulty. Let $V$ be an $n \times (n-p)$ matrix of constants such that $(X, V)$ is a
nonsingular $n \times n$ matrix and $V^TX = 0$, i.e. the columns of $V$ are orthogonal to those of $X$. If
we let $\mathcal{R}(X)$ denote the subspace of $\mathbb{R}^n$ spanned by the columns of $X$ and $\mathcal{R}(X)^\perp$ denote its
orthogonal complement, we have trivially that $\mathcal{R}(X)^\perp = \mathcal{R}(V)$ and hence
$\mathcal{R}(\Sigma^{-1/2}X)^\perp = \mathcal{R}(\Sigma^{1/2}V)$. Now $I - \Sigma^{-1/2}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1/2}$ projects $\mathbb{R}^n$ onto
$\mathcal{R}(\Sigma^{-1/2}X)^\perp$ and $\Sigma^{1/2}V(V^T\Sigma V)^{-1}V^T\Sigma^{1/2}$ projects $\mathbb{R}^n$ onto $\mathcal{R}(\Sigma^{1/2}V)$, and this
projection is unique (e.g. Seber, 1977, p. 394), so we have

$$I - \Sigma^{-1/2}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1/2} = \Sigma^{1/2}V(V^T\Sigma V)^{-1}V^T\Sigma^{1/2}.$$

Thus

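$$\begin{aligned}
\hat\beta_\Sigma &= (X^TX)^{-1}X^Ty - (X^TX)^{-1}X^T\Sigma^{1/2}\{I - \Sigma^{-1/2}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1/2}\}\Sigma^{-1/2}y \\
&= (X^TX)^{-1}X^Ty - (X^TX)^{-1}X^T\Sigma^{1/2}\{\Sigma^{1/2}V(V^T\Sigma V)^{-1}V^T\Sigma^{1/2}\}\Sigma^{-1/2}y \\
&= (X^TX)^{-1}\{X^T - X^T\Sigma V(V^T\Sigma V)^{-1}V^T\}y \\
&= \hat\beta_I - (X^TX)^{-1}X^T\Sigma V(V^T\Sigma V)^{-1}V^Ty,
\end{aligned}$$

where $\hat\beta_I = (X^TX)^{-1}X^Ty$ is the least squares estimator. This expression involves $\Sigma$ rather
than $\Sigma^{-1}$. At least so far as analysis is concerned, there is a slight further difficulty caused
by the fact that the dimensions of $n^{-1}V^T\Sigma V$ and $n^{-1}X^T\Sigma V$ increase with $n$. Amemiya (1983)
therefore suggested that we replace $V$ by an $n \times q$ matrix $W$ consisting of a fixed number $q < n-p$
of the columns of $V$. Replacing $\Sigma$ by $\hat\Sigma$, we obtain Amemiya's estimator

$$\hat\beta_{(1)} = \hat\beta_I - (X^TX)^{-1}X^T\hat\Sigma W(W^T\hat\Sigma W)^{-1}W^Ty. \qquad (1.2)$$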
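As an illustration of (1.2), the sketch below computes the estimator for a supplied $W$ and a supplied vector of variance proxies. The function names `partial_gls`, `amemiya_step` and `orthogonal_complement_columns` are hypothetical, and building $W$ from a full QR decomposition of $X$ is an assumption made here for convenience; the paper only requires $W^TX = 0$.

```python
import numpy as np

def partial_gls(X, y, W, s2):
    """beta_I - (X'X)^{-1} X' S W (W' S W)^{-1} W' y with S = diag(s2)."""
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)        # (X'X)^{-1} X'
    SW = s2[:, None] * W                              # S W
    correction = XtX_inv_Xt @ SW @ np.linalg.solve(W.T @ SW, W.T @ y)
    return XtX_inv_Xt @ y - correction

def amemiya_step(X, y, W, beta_init):
    """One application of (1.2): S is the diagonal of squared residuals from beta_init."""
    r2 = (y - X @ beta_init) ** 2
    return partial_gls(X, y, W, r2)

def orthogonal_complement_columns(X, q):
    """Hypothetical helper: q orthonormal columns W with W'X = 0, from the full QR of X."""
    Q_full, _ = np.linalg.qr(X, mode="complete")
    p = X.shape[1]
    return Q_full[:, p:p + q]
```

With the true variances and all $n-p$ complement columns, `partial_gls` reproduces $\hat\beta_\Sigma$, which is the reformulation derived above; with equal variances the correction term vanishes and the least squares estimator is returned.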

If $\Sigma = \sigma^2 I$ is known, $\hat\beta_{(1)} = \hat\beta_I$. It is obvious that in replacing $V$ by $W$ we are neglecting
some of the structure of $\Sigma$. Nonetheless, Amemiya (1983) showed that this estimator is always at
least as efficient as the least squares estimator $\hat\beta_I$ and Balestra (1983) showed that it can be as
efficient as $\hat\beta_\Sigma$; in particular, if there are only $q$ different variances in $\Sigma$, a judicious choice
of $W$ makes Amemiya's estimator equal to $\hat\beta_\Sigma$. The general issue of how to choose $W$ has not
been addressed. Nor has the possibility of allowing $q$ to diverge to infinity at a slower rate than $n$.
These interesting issues are beyond the scope of the present paper and will not be pursued here.
Our purpose is rather to obtain an expansion for $\hat\beta_{(1)}$ which enables us to examine the effect of
using $\hat\Sigma$ rather than $\Sigma$ in the estimator, the effect of the number of iterations and the effect of
the choice of the initial estimate. This work complements that of Carroll, Wu and Ruppert (1988)
and Rothenberg (1984) on the effect on weighted least squares of fitting parametric models for
$\Sigma$ and extends that of Fuller and Rao (1978) on the replicated case by relaxing the assumption
of normal errors.

2.  Theoretical  Results 

Our main result is a higher order asymptotic expansion for $\hat\beta_{(1)}$ including terms of order
$n^{-3/2}$ in probability. The expansion requires conditions on various sums and matrices
involving $X$, $W$ and the moments of the $u_j$'s which are stated in Section 3. We also require
a condition on the initial estimator $\hat\beta_{(0)}$. In particular, we suppose that $\hat\beta_{(0)}$ satisfies

$$\hat\beta_{(0)} - \beta = n^{-1}C^{-1}D^T\Psi(u) + O_p(n^{-1/2}), \qquad (2.1)$$

for some $p \times p$ nonsingular matrix $C = O(1)$, some $n \times p$ matrix $D$ and some vector function
$\Psi(u) = (\psi(u_1),\ldots,\psi(u_n))^T$, where $n^{-1}C^{-1}D^T\Psi(u) = O_p(n^{-1/2})$. It is convenient to set
$A = n^{-1}W^T\Sigma W$ and $M = X - n^{-1}WA^{-1}W^T\Sigma X$.

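Then we show in Section 3 that

$$\begin{aligned}
\hat\beta_{(1)} - \beta &= n^{-1/2}Z_{1n} + n^{-1}Z_{2n} + n^{-3/2}Z_{3n}(\hat\beta_{(0)} - \beta) + O_p(n^{-3/2}) \qquad (2.2) \\
&= n^{-1/2}Z_{1n} + n^{-1}Z_{2n} + n^{-3/2}Z_{3n}(n^{-1}C^{-1}D^T\Psi(u)) + O_p(n^{-3/2}),
\end{aligned}$$

where $n^{-1/2}Z_{1n} = (X^TX)^{-1}M^Tu$ and $Z_{tn} = O_p(1)$, $t = 1,2,3$. Here $Z_{3n}(\cdot): \mathbb{R}^p \to \mathbb{R}^p$ is a
function of the initial estimator whereas $Z_{1n}$ and $Z_{2n}$ are not. If $\Sigma$ were known, we would have
the identity

$$\hat\beta_{(1)} - \beta = n^{-1/2}Z_{1n} = (X^TX)^{-1}M^Tu,$$

and

$$\mathrm{Var}\,\hat\beta_{(1)} = n^{-1}EZ_{1n}Z_{1n}^T = \mathrm{Var}\,\hat\beta_I - n^{-1}(X^TX)^{-1}X^T\Sigma WA^{-1}W^T\Sigma X(X^TX)^{-1},$$

which, incidentally, proves that $\mathrm{Var}\,n^{-1/2}Z_{1n} \le \mathrm{Var}\,\hat\beta_I$. When $\Sigma$ is unknown, we have (2.2)
and, proceeding formally, the moment expansions

$$E\hat\beta_{(1)} - \beta = n^{-1}EZ_{2n} + o(n^{-1})$$

and

$$\mathrm{Var}\,\hat\beta_{(1)} = n^{-1}EZ_{1n}Z_{1n}^T + n^{-2}T(\hat\beta_{(0)} - \beta) + o(n^{-2}), \qquad (2.3)$$

where

$$\begin{aligned}
T(\hat\beta_{(0)} - \beta) ={}& EZ_{2n}Z_{1n}^T + EZ_{1n}Z_{2n}^T + EZ_{2n}Z_{2n}^T - EZ_{2n}EZ_{2n}^T \\
&+ EZ_{3n}(\hat\beta_{(0)} - \beta)Z_{1n}^T + EZ_{1n}Z_{3n}(\hat\beta_{(0)} - \beta)^T.
\end{aligned}$$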
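Returning to the known-$\Sigma$ case, the variance identity for the leading term can be checked numerically from the definitions of $A$ and $M$. The sketch below uses an illustrative random design and variances, and takes $W$ from a QR decomposition of $X$; these are assumptions, chosen only so that $W^TX = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 60, 2, 3
X = rng.normal(size=(n, p))
sigma2 = rng.uniform(0.5, 3.0, size=n)           # illustrative diagonal of Sigma
Sigma = np.diag(sigma2)
Q_full, _ = np.linalg.qr(X, mode="complete")
W = Q_full[:, p:p + q]                           # q columns orthogonal to X

A = W.T @ Sigma @ W / n                          # A = n^{-1} W' Sigma W
M = X - W @ np.linalg.solve(A, W.T @ Sigma @ X) / n
XtX_inv = np.linalg.inv(X.T @ X)

var_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv    # Var of the least squares estimator
var_lead = XtX_inv @ M.T @ Sigma @ M @ XtX_inv   # Var of (X'X)^{-1} M'u
B = X.T @ Sigma @ W
gain = XtX_inv @ B @ np.linalg.solve(A, B.T) @ XtX_inv / n

print(np.allclose(var_ols - var_lead, gain, atol=1e-12))
```

The matrix `gain` is nonnegative definite by construction, which is the efficiency statement made in the text for the known-$\Sigma$ case.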

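It is instructive to write $T = T_1 + T_2(\hat\beta_{(0)} - \beta)$, where $T_2(\cdot)$ is a function of $\hat\beta_{(0)} - \beta$ and
$T_1$ is not. Writing $m_j^T$ and $w_j^T$ for the $j$th rows of $M$ and $W$, it follows from the results in
Section 3 that

$$\begin{aligned}
T_1 ={}& -n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}w_j\,(\mu_{4j} - \sigma_j^4)(X^TX)^{-1} \qquad (2.4) \\
&+ 3n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_k^T\,(w_j^TA^{-1}w_k)^2\,\mu_{3j}\mu_{3k}(X^TX)^{-1} \\
&+ 2n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_j^T\, w_j^TA^{-1}w_k\, w_k^TA^{-1}w_k\,\mu_{3j}\mu_{3k}(X^TX)^{-1}
\end{aligned}$$

and

$$\begin{aligned}
T_2(n^{-1}C^{-1}D^T\Psi(u)) ={}& 4n^{-2}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}W^TGDC^{-1}x_j\,\sigma_j^2\,(X^TX)^{-1} \qquad (2.5) \\
&+ 4n^{-2}(X^TX)^{-1}\sum_{j=1}^n m_jx_j^T\{w_j^TA^{-1}w_j\,\sigma_j^2 - n^{-1}x_j^TC^{-1}D^TGWA^{-1}w_j\}C^{-1}D^TGM(X^TX)^{-1},
\end{aligned}$$

where $G = \mathrm{diag}(Eu_1\psi(u_1),\ldots,Eu_n\psi(u_n))$.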

It  is  perhaps  worth  noting  that,  with  considerable  work,  higher  order  terms  in  the  above 
expansions  could  be  obtained.  However,  the  above  expansions  contain  sufficient  terms  to 
capture  the  dominant  effect  of  the  initial  estimator.  Moreover,  Carroll,  Wu  and  Ruppert  (1988) 
found  that  the  conclusions  drawn  from  examining  expansions  of  this  order  seem  to  reflect,  at 
least  qualitatively,  the  findings  from  simulation  studies. 

The contribution of the initial estimator to Amemiya's estimator (1.2) is of order $n^{-3/2}$ in
probability and affects the second term in the expansion of the asymptotic variance of
Amemiya's estimator. We can iterate the procedure by using $\hat\beta_{(1)}$ as a new initial estimator,
calculating $\hat\beta_{(2)}$ etc. Identifying $n^{-1/2}Z_{1n} = (X^TX)^{-1}M^Tu$ with $n^{-1}C^{-1}D^T\Psi(u)$, that is, taking
$nC = X^TX$, $D = M$ and $\Psi(u) = u$, we find that for $c \ge 2$, (2.2) becomes

$$\hat\beta_{(c)} - \beta = n^{-1/2}Z_{1n} + n^{-1}Z_{2n} + n^{-3/2}Z_{3n}(n^{-1/2}Z_{1n}) + O_p(n^{-3/2}),$$

and (2.3) becomes

$$\mathrm{Var}\,\hat\beta_{(c)} = n^{-1}EZ_{1n}Z_{1n}^T + n^{-2}\{T_1 + T_2(n^{-1/2}Z_{1n})\} + o(n^{-2}).$$

Thus iteration reduces the contribution of the initial estimator to a smaller order than $n^{-3/2}$ in
probability and the first two terms of the asymptotic variance stabilise after two iterations.
Carroll, Wu and Ruppert (1988) obtained a similar result when the parametric model for $\Sigma$ does
not depend on $X\beta$ but found that an extra iteration is required to achieve stability when the
model for $\Sigma$ depends on $X\beta$.

It is not always straightforward to draw general conclusions from the expansions (2.4)
and (2.5) so it is worth considering the simple special case that the $u_j$'s are identically
distributed with a symmetric distribution so that $\Sigma = \sigma^2 I$, $\mu_{3j} = 0$ and $\mu_{4j} = \mu_4$. Notice that
here we are examining the consequences of proceeding as though we had a heteroscedastic
model when in fact we do not. In this case $M = X$ and (2.4) and (2.5) become

$$T_1 = -(\mu_4 - \sigma^4)\, n^{-1}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^T\, w_j^TA^{-1}w_j\,(X^TX)^{-1} \qquad (2.6)$$

and

$$\begin{aligned}
T_2(n^{-1}C^{-1}D^T\Psi(u)) ={}& 4\sigma^2 Eu_1\psi(u_1)\, n^{-2}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^T\{w_j^TA^{-1}W^TDC^{-1}x_j \qquad (2.7) \\
&\qquad + C^{-1}D^TX\, w_j^TA^{-1}w_j\}(X^TX)^{-1} \\
&- 4\{Eu_1\psi(u_1)\}^2\, n^{-3}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^TC^{-1}D^TX\; x_j^TC^{-1}D^TWA^{-1}w_j\,(X^TX)^{-1}.
\end{aligned}$$


Interestingly, Carroll, Wu and Ruppert (1988) found that using the least squares
estimator $\hat\beta_I$ as the initial estimator reduced the number of iterations for the covariance to
stabilise by one in each case; this is not in general true for Amemiya's estimator. However,
when the $u_j$'s are identically and symmetrically distributed the least squares estimator $\hat\beta_I$
satisfies (2.1) with $nC = X^TX$, $D = X$ and $\psi(u) = u$ and, for $c \ge 2$, $\hat\beta_{(c-1)}$ satisfies (2.1) with
$nC = X^TX$, $D = M = X$ and $\psi(u) = u$ so that

$$T_2(\hat\beta_I - \beta) = T_2(\hat\beta_{(c-1)} - \beta) = 4\sigma^4\, n^{-1}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^T\, w_j^TA^{-1}w_j\,(X^TX)^{-1}, \quad c \ge 2. \qquad (2.8)$$

Thus in this particular case, starting with the least squares estimator results in a stable
covariance after only one iteration.

Carroll, Wu and Ruppert (1988) show further that there may be advantages to using a
robust initial estimator. Suppose we use the M-estimator $\beta^*$ obtained by solving

$$\sum_{j=1}^n x_j\,\psi((y_j - x_j^T\beta)/\hat\omega) = 0,$$

where $\hat\omega$ is a consistent estimate of some scale functional $\omega$ which need not equal $\sigma$. If the
$u_j$'s are identically distributed with a symmetric distribution, $\beta^*$ satisfies (2.1) with
$nC = \omega^{-1}E\psi'(u_1/\omega)X^TX$ and $D = X$ so that

$$T_2(\beta^* - \beta) = 4\sigma^2\,\frac{Eu_1\psi(u_1/\omega)}{\omega^{-1}E\psi'(u_1/\omega)}\; n^{-1}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^T\, w_j^TA^{-1}w_j\,(X^TX)^{-1}. \qquad (2.9)$$

Since $\sum_{j=1}^n x_jx_j^T\, w_j^TA^{-1}w_j$ is nonnegative definite, a comparison of (2.8) and (2.9) shows that
$\hat\beta_{(1)}$ based on an M-estimator has a smaller covariance (up to terms of order $n^{-2}$) than $\hat\beta_{(1)}$
based on the least squares estimator or, indeed, on the iterated stable estimator $\hat\beta_{(c)}$, $c \ge 2$,
whenever

$$Eu_1\psi(u_1/\omega) < \sigma^2\omega^{-1}E\psi'(u_1/\omega). \qquad (2.10)$$

Note that more generally when the $u_j$'s have arbitrary symmetric distributions $\beta^*$ satisfies (2.1)
with $nC = \omega^{-1}X^T\mathrm{diag}(E\psi'(u_1/\omega),\ldots,E\psi'(u_n/\omega))X$ and $D = X$ so that we can write down
expansions for this case too. Moreover, we can also drop the symmetry assumption but at the
cost of a slightly more sophisticated analysis.
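Condition (2.10) can be evaluated by quadrature for a given $\psi$ and error density. The sketch below does this for Huber's $\psi$ with tuning constant 1.345 and $\omega = 1$; the choice $\omega = 1$, the integration grid and the helper names are assumptions made for illustration, not choices documented in the paper.

```python
import math
import numpy as np

def huber_psi(z, c=1.345):
    return np.clip(z, -c, c)

def huber_dpsi(z, c=1.345):
    return (np.abs(z) <= c).astype(float)

def sides_of_2_10(density, grid, omega=1.0):
    """Return (E u psi(u/omega), sigma^2 omega^{-1} E psi'(u/omega)) by the trapezoid rule."""
    f = density(grid)
    integrate = lambda g: float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(grid)))
    sigma2 = integrate(grid**2 * f)
    lhs = integrate(grid * huber_psi(grid / omega) * f)
    rhs = sigma2 * integrate(huber_dpsi(grid / omega) * f) / omega
    return lhs, rhs

normal_pdf = lambda u: np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)
t5_const = math.gamma(3.0) / (math.sqrt(5.0 * math.pi) * math.gamma(2.5))
t5_pdf = lambda u: t5_const * (1.0 + u**2 / 5.0) ** -3   # Student t, 5 degrees of freedom

grid = np.linspace(-200.0, 200.0, 800001)
lhs_normal, rhs_normal = sides_of_2_10(normal_pdf, grid)
lhs_t5, rhs_t5 = sides_of_2_10(t5_pdf, grid)
```

Under these assumptions the two sides coincide (up to quadrature error) for standard normal errors, while for $t(5)$ errors the left side is the smaller one, so (2.10) holds in that case.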

We can also examine the effect of including $\hat\Sigma$ in our analysis when it is not actually
required in the identically distributed symmetric case. Since

$$\begin{aligned}
T(\beta^* - \beta) &= T_1 + T_2(\beta^* - \beta) \\
&= -\Big\{\mu_4 - \sigma^4 - 4\sigma^2\,\frac{Eu_1\psi(u_1/\omega)}{\omega^{-1}E\psi'(u_1/\omega)}\Big\}\, n^{-1}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^T\, w_j^TA^{-1}w_j\,(X^TX)^{-1}
\end{aligned}$$

and

$$T(\hat\beta_I - \beta) = T(\hat\beta_{(c-1)} - \beta) = -\{\mu_4 - 5\sigma^4\}\, n^{-1}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^T\, w_j^TA^{-1}w_j\,(X^TX)^{-1}, \quad c \ge 2,$$

we see that for near normal distributions with $\kappa = \mu_4/\sigma^4 < 5$, including $\hat\Sigma$ when it is not actually
required causes an increase in the covariance compared to when $\Sigma = \sigma^2 I$ is known. However,
for long-tailed distributions with $\kappa > 5$, including $\hat\Sigma$ actually reduces the covariance (up to
$O(n^{-2})$) compared to when $\Sigma = \sigma^2 I$ is known. The same result was found in Carroll, Wu and
Ruppert (1988). One possible explanation is that when we have long-tailed distributions we
obtain some large residuals and weighted least squares estimators downweight the observations
corresponding to these residuals so that we actually get a kind of robustness effect.
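The sign change at $\kappa = 5$ can be seen by evaluating the second order term for the least squares start under two error distributions. This is a minimal sketch, assuming an illustrative random design and a $W$ built from a QR decomposition of $X$; the function name `second_order_term` is hypothetical.

```python
import numpy as np

def second_order_term(X, W, sigma2, mu4):
    """Return T(beta_I - beta) = -(mu4 - 5 sigma^4) S and S for i.i.d. symmetric errors, where
    S = n^{-1} (X'X)^{-1} sum_j x_j x_j' (w_j' A^{-1} w_j) (X'X)^{-1}."""
    n = X.shape[0]
    A_inv = np.linalg.inv(sigma2 * (W.T @ W) / n)        # A = n^{-1} W' Sigma W, Sigma = sigma^2 I
    h = np.einsum("jk,kl,jl->j", W, A_inv, W)            # w_j' A^{-1} w_j
    XtX_inv = np.linalg.inv(X.T @ X)
    S = XtX_inv @ ((X.T * h) @ X) @ XtX_inv / n
    return -(mu4 - 5.0 * sigma2**2) * S, S

rng = np.random.default_rng(2)
n, p, q = 40, 2, 2
X = rng.normal(size=(n, p))
W = np.linalg.qr(X, mode="complete")[0][:, p:p + q]

# Normal errors: kappa = 3, so mu4 = 3 sigma^4.
T_normal, _ = second_order_term(X, W, sigma2=1.0, mu4=3.0)
# t(5) errors: sigma^2 = 5/3 and kappa = 9, so mu4 = 9 sigma^4.
s2_t5 = 5.0 / 3.0
T_t5, _ = second_order_term(X, W, sigma2=s2_t5, mu4=9.0 * s2_t5**2)
```

Here `T_normal` is nonnegative definite (estimating the weights inflates the covariance) and `T_t5` is nonpositive definite (estimating the weights reduces it), in line with the discussion above.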

Finally, consider the particular case where $u$ has a multivariate normal distribution.
Rothenberg (1984) has examined the special case where $\Sigma$ depends on a finite dimensional
parameter $\theta$ which is not a function of $\beta$. He assumes that $\hat\Sigma$ is formed from estimates $\hat\theta$
which are even functions of $u$ and also do not depend on $\beta$. Given the closure of the
multivariate normal distribution under linear transformations, this last condition implies that $\hat\theta$
is an even function of the projection of $y$ onto the orthogonal complement of the column space
of $X$. That is, the initial estimator $\hat\beta_{(0)}$ will be of the form $\hat\beta_{(0)} = (\tilde X^TQ\tilde X)^{-1}\tilde X^TQy$, where $Q$ is
an arbitrary positive definite matrix not depending on $\beta$ and $\tilde X$ is any matrix which spans the
column space of $X$. Then $\hat\theta$ is obtained as an even function of the resulting residuals. He
found that including $\hat\Sigma$ increases the covariance compared to when $\Sigma$ is known, and that the
number of iterations and the choice of $Q$ do not matter. Note that for normal $u_1$ an integration by
parts implies that $Eu_1\psi(u_1/\omega) = \sigma^2\omega^{-1}E\psi'(u_1/\omega)$ and so (2.10) cannot hold. In non-normal
models, however, choosing $\psi$ so that (2.10) holds, we can actually decrease the covariance (up
to $n^{-2}$) compared to when $\Sigma$ is known. Moreover, even if we restrict attention to linear initial
estimators we find that the number of iterations does matter. Here on setting $\tilde X = X$, the most
plausible choice for $\tilde X$, $\hat\beta_{(0)} = (X^TQX)^{-1}X^TQy$ satisfies (2.1) with $nC = X^TQX$, $D = QX$ and
$\psi(u) = u$ so $G = \Sigma$ and (2.5) becomes

$$\begin{aligned}
T_2((X^TQX)^{-1}X^TQu) ={}& 4n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}W^T\Sigma QX(X^TQX)^{-1}x_j\,\sigma_j^2\,(X^TX)^{-1} \\
&+ 4n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_jx_j^T\, w_j^TA^{-1}w_j\,\sigma_j^2\,(X^TQX)^{-1}X^TQ\Sigma M(X^TX)^{-1} \\
&- 4n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_jx_j^T(X^TQX)^{-1}X^TQ\Sigma M\; x_j^T(X^TQX)^{-1}X^TQ\Sigma WA^{-1}w_j\,(X^TX)^{-1},
\end{aligned}$$

which depends on $Q$. However, in the identically distributed case ($\Sigma = \sigma^2 I$), the number of
iterations and the choice of $Q$ do not matter as $M = X$ and (2.7) becomes

$$T_2((X^TQX)^{-1}X^TQu) = 4\sigma^4\, n^{-1}(X^TX)^{-1}\sum_{j=1}^n x_jx_j^T\, w_j^TA^{-1}w_j\,(X^TX)^{-1},$$

which does not depend on $Q$.
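The independence from $Q$ in the identically distributed case can be confirmed by evaluating (2.7) directly. The sketch below implements (2.7) as written, with `g` standing for $Eu_1\psi(u_1)$; the function name `t2_iid`, the random design and the randomly generated positive definite $Q$ are illustrative assumptions.

```python
import numpy as np

def t2_iid(X, W, C, D, g, sigma2):
    """Evaluate (2.7) for i.i.d. symmetric errors; g stands for E u_1 psi(u_1)."""
    n = X.shape[0]
    A_inv = np.linalg.inv(sigma2 * (W.T @ W) / n)      # A = n^{-1} W' Sigma W, Sigma = sigma^2 I
    XtX_inv = np.linalg.inv(X.T @ X)
    C_inv = np.linalg.inv(C)
    CDX = C_inv @ D.T @ X                              # C^{-1} D' X
    WAW = W @ A_inv @ W.T
    h = np.diag(WAW)                                   # w_j' A^{-1} w_j
    a = np.sum((WAW @ D @ C_inv) * X, axis=1)          # w_j' A^{-1} W' D C^{-1} x_j
    b = np.sum((WAW @ D @ C_inv.T) * X, axis=1)        # x_j' C^{-1} D' W A^{-1} w_j
    weighted = lambda s: (X.T * s) @ X                 # sum_j s_j x_j x_j'
    inner = (4.0 * sigma2 * g / n**2) * (weighted(a) + weighted(h) @ CDX) \
        - (4.0 * g**2 / n**3) * (weighted(b) @ CDX)
    return XtX_inv @ inner @ XtX_inv

rng = np.random.default_rng(5)
n, p, q = 40, 2, 2
X = rng.normal(size=(n, p))
W = np.linalg.qr(X, mode="complete")[0][:, p:p + q]
sigma2 = 1.5
G0 = rng.normal(size=(n, n))
Q = G0 @ G0.T / n + np.eye(n)                          # an arbitrary positive definite Q

# psi(u) = u gives g = E u^2 = sigma^2 for both initial estimators.
T2_ols = t2_iid(X, W, X.T @ X / n, X, sigma2, sigma2)          # nC = X'X,  D = X
T2_q = t2_iid(X, W, X.T @ Q @ X / n, Q @ X, sigma2, sigma2)    # nC = X'QX, D = QX
print(np.allclose(T2_ols, T2_q))
```

The terms involving $W^TQX$ enter the first and third sums with opposite signs and cancel, which is why the two evaluations agree.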


3.   Proofs 

In this section we give a formal proof of the expansion (2.2), obtain expressions for
$Z_{tn}$, $t = 1,2,3$, and then calculate formally the moments which appear in $T_1$ and $T_2(\cdot)$.

To prove (2.2) suppose that (2.1) holds and that, with $M = X - W(W^T\Sigma W)^{-1}W^T\Sigma X$,

i) $n^{-1}X^TX$ and $n^{-1}W^T\Sigma W$ converge to nonsingular limits,

and

ii) $n^{-1}X^T\Sigma X = O(1)$, $n^{-1}W^T\Sigma X = O(1)$,

$$n^{-1}\sum_{j=1}^n |w_jw_j^T|\,|x_j|^2 = O(1), \qquad n^{-1}\sum_{j=1}^n |m_jw_j^T|\,|x_j|^2 = O(1),$$

$$n^{-1}\sum_{j=1}^n (w_jw_j^T)*(w_jw_j^T)\,\mu_{4j} = O(1),$$

$$n^{-1}\sum_{j=1}^n (m_jw_j^T)*(m_jw_j^T)\,\mu_{4j} = O(1),$$

$$n^{-1}\sum_{j=1}^n w_jw_j^T\, w_{jk}w_{jl}\,\sigma_j^2 = O(1), \qquad 1 \le k,l \le q,$$

$$n^{-1}\sum_{j=1}^n x_jx_j^T\, w_{jk}m_{jl}\,\sigma_j^2 = O(1), \qquad 1 \le k \le q,\ 1 \le l \le p,$$

hold. (Here $*$ denotes the Hadamard product of two matrices.)
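First note that as $W^TX = 0$, we can write

$$\begin{aligned}
\hat\beta_{(1)} &= \hat\beta_I - (X^TX)^{-1}X^T\hat\Sigma W(W^T\hat\Sigma W)^{-1}W^Ty \\
&= \beta + (X^TX)^{-1}\{X^T - X^T\hat\Sigma W(W^T\hat\Sigma W)^{-1}W^T\}u. \qquad (3.1)
\end{aligned}$$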


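To simplify notation let

$$G_1 = \mathrm{diag}(u_1^2 - \sigma_1^2,\ldots,u_n^2 - \sigma_n^2),$$

$$G_2 = \mathrm{diag}(u_1x_1^T(\hat\beta_{(0)} - \beta),\ldots,u_nx_n^T(\hat\beta_{(0)} - \beta))$$

and

$$G_3 = \mathrm{diag}(\{x_1^T(\hat\beta_{(0)} - \beta)\}^2,\ldots,\{x_n^T(\hat\beta_{(0)} - \beta)\}^2).$$

Then, squaring the residuals, we obtain

$$\begin{aligned}
n^{-1}X^T\hat\Sigma W &= n^{-1}X^T\Sigma W + n^{-1}X^T\mathrm{diag}(r_1^2 - \sigma_1^2,\ldots,r_n^2 - \sigma_n^2)W \\
&= n^{-1}X^T\Sigma W + n^{-1}X^TG_1W - 2n^{-1}X^TG_2W + n^{-1}X^TG_3W \qquad (3.2)
\end{aligned}$$

and, similarly,

$$n^{-1}W^T\hat\Sigma W = n^{-1}W^T\Sigma W + n^{-1}W^TG_1W - 2n^{-1}W^TG_2W + n^{-1}W^TG_3W.$$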
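The decomposition behind (3.2) is the elementwise identity $r_j^2 - \sigma_j^2 = (u_j^2 - \sigma_j^2) - 2u_jx_j^T(\hat\beta_{(0)} - \beta) + \{x_j^T(\hat\beta_{(0)} - \beta)\}^2$. A minimal numerical check, using an illustrative simulated data set and least squares as the initial estimator (both assumptions), is:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 2
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0])
sigma2 = rng.uniform(0.5, 2.0, size=n)
u = rng.normal(size=n) * np.sqrt(sigma2)
y = X @ beta + u

beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # any initial estimator would do here
delta = beta0 - beta

r2 = (y - X @ beta0) ** 2                      # diagonal of Sigma-hat
g1 = u**2 - sigma2                             # diagonal of G1
g2 = u * (X @ delta)                           # diagonal of G2
g3 = (X @ delta) ** 2                          # diagonal of G3

print(np.allclose(r2 - sigma2, g1 - 2.0 * g2 + g3))
```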

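Notice that when $\hat A - A = O_p(n^{-1/2})$ we have

$$\hat A^{-1} = A^{-1} - A^{-1}(\hat A - A)A^{-1} + A^{-1}(\hat A - A)A^{-1}(\hat A - A)A^{-1} + O_p(n^{-3/2})$$

so that with $\hat A = n^{-1}W^T\hat\Sigma W$ and $A = n^{-1}W^T\Sigma W$, we obtain

$$\begin{aligned}
n(W^T\hat\Sigma W)^{-1} ={}& A^{-1} - A^{-1}n^{-1}W^TG_1WA^{-1} + A^{-1}n^{-1}W^TG_1WA^{-1}n^{-1}W^TG_1WA^{-1} \qquad (3.3) \\
&+ 2A^{-1}n^{-1}W^TG_2WA^{-1} - A^{-1}n^{-1}W^TG_3WA^{-1} + O_p(n^{-3/2}).
\end{aligned}$$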
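The second order expansion of the inverse used here has a remainder that is cubic in the perturbation. The sketch below illustrates this on an arbitrary positive definite matrix; the matrices and the perturbation sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
q = 3
B = rng.normal(size=(q, q))
A = B @ B.T + q * np.eye(q)                   # a well-conditioned positive definite "A"
E = rng.normal(size=(q, q))
E = E + E.T                                   # symmetric perturbation direction
A_inv = np.linalg.inv(A)

def expansion_error(eps):
    """Norm of the remainder after the second order terms, with A-hat - A = eps * E."""
    delta = eps * E
    approx = A_inv - A_inv @ delta @ A_inv + A_inv @ delta @ A_inv @ delta @ A_inv
    return np.linalg.norm(np.linalg.inv(A + delta) - approx)

errors = [expansion_error(eps) for eps in (1e-1, 1e-2, 1e-3)]
print(errors)
```

Shrinking the perturbation by a factor of ten shrinks the remainder by roughly a factor of a thousand, which corresponds to the $O_p(n^{-3/2})$ remainder when $\hat A - A = O_p(n^{-1/2})$.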


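Substituting (3.2), (3.3) and (2.1) into (3.1) yields

$$\hat\beta_{(1)} - \beta = Z_{1n} + Z_{2n} + Z_{3n}(\hat\beta_{(0)} - \beta) + O_p(n^{-3/2}),$$

where, with the powers of $n$ in (2.2) absorbed into the terms,

$$Z_{1n} = (X^TX)^{-1}M^Tu = O_p(n^{-1/2}),$$

$$Z_{2n} = -n^{-1}(X^TX)^{-1}M^TG_1WA^{-1}W^Tu = O_p(n^{-1})$$

and

$$\begin{aligned}
Z_{3n}(\hat\beta_{(0)} - \beta) ={}& n^{-2}(X^TX)^{-1}M^TG_1WA^{-1}W^TG_1WA^{-1}W^Tu \\
&+ 2n^{-1}(X^TX)^{-1}M^TG_2WA^{-1}W^Tu - n^{-1}(X^TX)^{-1}M^TG_3WA^{-1}W^Tu \\
={}& n^{-2}(X^TX)^{-1}M^TG_1WA^{-1}W^TG_1WA^{-1}W^Tu \\
&+ 2n^{-2}(X^TX)^{-1}M^T\mathrm{diag}(u_1x_1^TC^{-1}D^T\Psi(u),\ldots,u_nx_n^TC^{-1}D^T\Psi(u))WA^{-1}W^Tu \\
&- n^{-3}(X^TX)^{-1}M^T\mathrm{diag}(x_1^TC^{-1}D^T\Psi(u),\ldots,x_n^TC^{-1}D^T\Psi(u))^2WA^{-1}W^Tu \\
={}& O_p(n^{-3/2}).
\end{aligned}$$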

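Now writing $D^T = (d_1,\ldots,d_n)$, $G = \mathrm{diag}(Eu_1\psi(u_1),\ldots,Eu_n\psi(u_n))$ and proceeding formally,

$$EZ_{1n} = 0;$$

$$EZ_{1n}Z_{1n}^T = (X^TX)^{-1}M^T\Sigma M(X^TX)^{-1};$$

$$EZ_{2n} = -n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_j\, w_j^TA^{-1}w_j\,\mu_{3j};$$

$$EZ_{2n}Z_{1n}^T = -n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}w_j\,(\mu_{4j} - \sigma_j^4)(X^TX)^{-1};$$

$$\begin{aligned}
EZ_{2n}Z_{2n}^T ={}& n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_k^T\, w_j^TA^{-1}w_j\, w_k^TA^{-1}w_k\,\mu_{3j}\mu_{3k}(X^TX)^{-1} \\
&+ n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_k^T\,(w_j^TA^{-1}w_k)^2\,\mu_{3j}\mu_{3k}(X^TX)^{-1} \\
&+ n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}w_j\,(\mu_{4j} - \sigma_j^4)(X^TX)^{-1} + O(n^{-3});
\end{aligned}$$

$$\begin{aligned}
EZ_{3n}Z_{1n}^T ={}& n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_j^T\, w_j^TA^{-1}w_k\, w_k^TA^{-1}w_k\,\mu_{3j}\mu_{3k}(X^TX)^{-1} \\
&+ n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_k^T\,(w_j^TA^{-1}w_k)^2\,\mu_{3j}\mu_{3k}(X^TX)^{-1} \\
&+ 2n^{-2}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}W^TGDC^{-1}x_j\,\sigma_j^2\,(X^TX)^{-1} \\
&+ 2n^{-2}(X^TX)^{-1}\sum_{j=1}^n m_jx_j^T\{w_j^TA^{-1}w_j\,\sigma_j^2 - n^{-1}x_j^TC^{-1}D^TGWA^{-1}w_j\}C^{-1}D^TGM(X^TX)^{-1} \\
&+ O(n^{-3}),
\end{aligned}$$

as $A^{-1}W^T\Sigma M = A^{-1}W^T\Sigma X - n^{-1}A^{-1}W^T\Sigma WA^{-1}W^T\Sigma X = A^{-1}W^T\Sigma X - A^{-1}W^T\Sigma X = 0$.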
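Finally,

$$\begin{aligned}
& EZ_{2n}Z_{1n}^T + EZ_{1n}Z_{2n}^T + EZ_{2n}Z_{2n}^T - EZ_{2n}EZ_{2n}^T + EZ_{3n}(\hat\beta_{(0)} - \beta)Z_{1n}^T + EZ_{1n}Z_{3n}(\hat\beta_{(0)} - \beta)^T \\
&\quad = -n^{-1}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}w_j\,(\mu_{4j} - \sigma_j^4)(X^TX)^{-1} \\
&\qquad + 3n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_k^T\,(w_j^TA^{-1}w_k)^2\,\mu_{3j}\mu_{3k}(X^TX)^{-1} \\
&\qquad + 2n^{-2}(X^TX)^{-1}\sum_{j=1}^n\sum_{k=1}^n m_jm_j^T\, w_j^TA^{-1}w_k\, w_k^TA^{-1}w_k\,\mu_{3j}\mu_{3k}(X^TX)^{-1} \\
&\qquad + 4n^{-2}(X^TX)^{-1}\sum_{j=1}^n m_jm_j^T\, w_j^TA^{-1}W^TGDC^{-1}x_j\,\sigma_j^2\,(X^TX)^{-1} \\
&\qquad + 4n^{-2}(X^TX)^{-1}\sum_{j=1}^n m_jx_j^T\{w_j^TA^{-1}w_j\,\sigma_j^2 - n^{-1}x_j^TC^{-1}D^TGWA^{-1}w_j\}C^{-1}D^TGM(X^TX)^{-1}.
\end{aligned}$$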

4.  Numerical  Results 

We performed a limited simulation experiment to examine some of the predictions of the
asymptotics, the results of which are presented in Table 1. Using a sample size of 50 we fitted
a linear regression through the origin with $x \sim N(0, 25)$ and the coefficient on $x$ set to unity.
Although not reported, other sample sizes were examined with improved performance,
measured in terms of mean squared error, as the sample size increased and inferior performance
for smaller sample sizes. The disturbance term had zero mean but its distribution differed from
case to case. The M-estimator chosen as an initial estimator was that proposed by Huber
(1964) with $c$, using the notation of Amemiya (1985, equation 2.3.2), chosen to be 1.345.
Some experimentation suggested that the results obtained were relatively robust to the choice of
$c$. In constructing the weighted least squares estimator, $W$ was chosen (initially) to be the first
column of $P_X = I_n - X(X^TX)^{-1}X^T$. In what follows we shall denote the iterated weighted
least squares estimator, using ordinary least squares as an initial estimator, by $\hat\beta_{ls}$; $\hat\beta_m$ shall
denote its analogue based on the M-estimator. This notation suppresses the number of iterations
used in the estimation process. In Table 1, mean squared errors are reported for estimators
involving one through five iterations, inclusive. All results are based on 1000 replications.
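The design just described can be sketched in code. The following is a minimal illustration and not the authors' program: it covers only the least squares start with $N(0,1)$ disturbances, redraws $x$ in every replication, and uses far fewer replications than the 1000 reported, all of which are assumptions made here. The function names are hypothetical.

```python
import numpy as np

def iterated_amemiya(x, y, w, beta_init, n_iter=5):
    """Iterate (1.2) for regression through the origin with a single column w (q = 1)."""
    beta_ols = (x @ y) / (x @ x)
    beta, path = beta_init, []
    for _ in range(n_iter):
        r2 = (y - x * beta) ** 2                               # diagonal of Sigma-hat
        beta = beta_ols - (x @ (r2 * w)) / (x @ x) * (w @ y) / (w @ (r2 * w))
        path.append(beta)
    return np.array(path)

def simulate_mse(n=50, n_rep=200, n_iter=5, seed=0):
    """Monte Carlo mean squared error of the estimator after 1, ..., n_iter iterations."""
    rng = np.random.default_rng(seed)
    sq_err = np.zeros(n_iter)
    for _ in range(n_rep):
        x = rng.normal(0.0, 5.0, size=n)                       # x ~ N(0, 25)
        u = rng.normal(size=n)                                 # homoscedastic N(0, 1) errors
        y = x + u                                              # coefficient on x set to unity
        P = np.eye(n) - np.outer(x, x) / (x @ x)               # P_X
        w = P[:, 0]                                            # first column of P_X
        beta_ols = (x @ y) / (x @ x)
        sq_err += (iterated_amemiya(x, y, w, beta_ols, n_iter) - 1.0) ** 2
    return sq_err / n_rep

mse_by_iteration = simulate_mse()
```

No attempt is made here to reproduce the entries of Table 1; the sketch only fixes ideas about how the iterations and the choice of $W$ enter the computation.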

As a bench-mark we can compare the performance of $\hat\beta_{ls}$ and $\hat\beta_m$ when the disturbances
of the model are $u_i \sim N(0, 1)$, $i = 1,\ldots,n$ (experiment 1), and when $u_i \sim t(5)$, $i = 1,\ldots,n$
(experiment 2). In both experiments the disturbances are homoscedastic. In the latter
experiment $\kappa = 9$ and, as predicted, $\hat\beta_m$ performs better than $\hat\beta_{ls}$ although, as in experiment 1,
there is little to choose between them. One common feature of the two sets of results is that
nothing appears to be gained by iterating. Indeed mean squared error seems to increase with the
number of iterations. There was some evidence to suggest that the mean squared error
converged to some finite value, usually within four to seven iterations.
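The value $\kappa = 9$ quoted for experiment 2 follows from the standard kurtosis formula for Student's $t$ distribution with $\nu$ degrees of freedom. The worked step below uses that textbook formula, which is not derived in the paper:

```latex
\kappa(\nu) = \frac{\mu_4}{\sigma^4} = \frac{3(\nu - 2)}{\nu - 4}, \quad \nu > 4,
\qquad\text{so}\qquad
\kappa(5) = \frac{3 \cdot 3}{1} = 9 > 5 .
```

On the same formula $\kappa(\nu) > 5$ exactly when $\nu < 7$, so $t(5)$ errors fall in the long-tailed regime of Section 2.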

Table  1  about  here 


Experiments 3 and 4 repeat the first two experiments but with $u_i \sim N(0, i)$, $i = 1,\ldots,n$
and $u_i = i^{1/2}v_i$, $v_i \sim t(5)$, $i = 1,\ldots,n$, respectively. That is, these experiments consider
heteroscedastic models with the scale of the disturbance increasing with the index. The most
noticeable feature of these results is the dramatic decline in the performance of the estimators
relative to that for the homoscedastic models. In experiment 3 we see that, for $\kappa < 5$, there
remains little to choose between the two estimators. In contrast, the results of experiment 4
suggest that as the error distribution becomes increasingly leptokurtic there are benefits in
using a robust initial estimator.

As indicated in the introduction, no effort has been devoted to finding the optimal $W$ for
the estimator although Balestra (1983) has shown that in certain special situations there may
exist such a choice. Nevertheless some investigation of the effect of different choices for $W$
was made by using different columns of $P_X$ in the construction of the estimators. The worst
case that was found is presented as experiment 5. It is evident from the results that the performance
of both estimators is dramatically worse than for the other experiments. Further, the mean
squared errors oscillate quite violently. This behaviour is not entirely understood; it may be that
these results are driven by the inversion of an ill-conditioned matrix. In any case there is enough
evidence to suggest that these weighted least squares estimators are sensitive to the choice of $W$.
This remains a topic for further research.

*  The  authors  would  like  to  thank  Trevor  Breusch,  Jose  Machado  and  Terry  O'Neill  for 
helpful  discussions.  The  usual  caveat  applies. 

References 

Amemiya,  T.,  1983,  Partially  generalised  least  squares  and  two-stage  least  squares  estimators, 
Journal of Econometrics 23, 275-283.

Amemiya,  T.,  1985,  Advanced  Econometrics  (Basil  Blackwell). 

Balestra,  P.,  1983,    A  note  on  Amemiya's  partially  generalised  least  squares,    Journal  of 
Econometrics  23,  285-290. 

Carroll,  R.J.  and  Ruppert,  D.,  1988,  Transformation  and  weighting  in  regression  (Chapman 
and  Hall,  New  York). 

Carroll,  R.J.,  Wu,  C.F.J,  and  Ruppert,  D.,  1988,    The  effect  of  estimating  weights  in 
weighted  least  squares,  Journal  of  the  American  Statistical  Association  83,  1045-1054. 

Chamberlain,  G.,  1982,     Multivariate  regression  models  for  panel  data,     Journal  of 
Econometrics  18,  5-46. 

Eicker,  F.,  1963,  Asymptotic  normality  and  consistency  of  the  least  squares  estimators  for 
families  of  linear  regressions,  Annals  of  Mathematical  Statistics  34, 447-456. 

Fuller,  W.A.  and  Rao,  J.N.K.,  1978,  Estimation  for  a  linear  regression  model  with  unknown 
diagonal covariance matrix, Annals of Statistics 6, 1149-1158.

Huber,  P.J.,  1964,  Robust  estimation  of  a  location  parameter,  Annals  of  Mathematical 
Statistics  35,  73-101. 

Rothenberg,  T.J.,  1984,    Approximate  normality  of  generalised  least  squares  estimates, 
Econometrica 52, 811-825.

Seber,  G.A.F.,  1977,  Linear  regression  analysis  (Wiley,  New  York). 

White,  H.,   1982,     Instrumental  variables  regression  with  independent  observations, 
Econometrica 50, 483-499.


Table 1
Estimated Mean Squared Errors

                                                    Iterations
Experiment  Estimator                  1            2            3            4            5

1           $\hat\beta_{ls}$      0.8170       0.8433       0.8666       0.8859       0.9020
            $\hat\beta_m$         0.8266       0.8512       0.8743       0.8936       0.9087

2           $\hat\beta_{ls}$      1.5151       1.6075       1.6799       1.7342       1.7738
            $\hat\beta_m$         1.4588       1.5883       1.6731       1.7336       1.7765

3           $\hat\beta_{ls}$     20.4400      21.9701      23.6265      24.9826      26.0440
            $\hat\beta_m$        19.8888      21.8939      23.7231      25.1056      26.1486

4           $\hat\beta_{ls}$     31.9520      31.7733      32.7435      33.7856      34.6158
            $\hat\beta_m$        27.6794      30.0238      31.9873      33.4609      34.5188

5           $\hat\beta_{ls}$   3522.9696    1087.7951    4694.9841     222.9984    4663.7089
            $\hat\beta_m$      4014.5814     603.4256    3788.3785     428.9702    4107.2634

